Write chunks with negative zero values and a zero fill value #3216

Merged: 8 commits into zarr-developers:main from the 3144-negative-zero branch, Aug 6, 2025

bojidar-bg
Contributor

@bojidar-bg bojidar-bg commented Jul 8, 2025

Fixes #3144.

Using np.any(self._data) was inspired by how Zarr v2 checks for equality with a falsey fill value.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jul 8, 2025
@bojidar-bg bojidar-bg force-pushed the 3144-negative-zero branch from d4c1205 to 2745b68 Compare July 8, 2025 11:48
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jul 8, 2025
@bojidar-bg bojidar-bg force-pushed the 3144-negative-zero branch from 2745b68 to 7d6d74b Compare July 8, 2025 11:49
@bojidar-bg
Contributor Author

Oh, oops, thanks 😅

@bojidar-bg bojidar-bg force-pushed the 3144-negative-zero branch from 7d6d74b to c4904e3 Compare July 8, 2025 11:53
@d-v-b
Contributor

d-v-b commented Jul 11, 2025

this test failure seems significant: https://github.com/zarr-developers/zarr-python/actions/runs/16172926021/job/45650861381?pr=3216#step:8:420

@dcherian
Contributor

this test failure seems significant

Yes, it looks like this approach doesn't work for complex number types.

@d-v-b
Contributor

d-v-b commented Jul 11, 2025

what if we view the array as raw bytes (should be cheap) and compare the raw bytes?

>>> import numpy as np
>>> np.array([0.0]) == np.array([-0.0])
array([ True])
>>> np.array([0.0]).view('V') == np.array([-0.0]).view('V')
array([False])

@bojidar-bg
Contributor Author

I wonder if that would somehow break with floating-point subnormals and the like. Will have to experiment 🤔

@dstansby dstansby added this to the 3.1.2 milestone Jul 31, 2025
Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>
@bojidar-bg
Contributor Author

Took me a bit, but I finally got around to it. Subnormals are fine and behave as expected. The only differences between Python's float equality and bitwise float equality are that signed zeros compare as unequal when comparing their bits, and that NaNs can sometimes compare as equal when comparing their bits. The former is exactly what we want, and the latter won't occur, since the code path is triggered only for signed-zero fill values.

>>> import numpy as np
>>> np.array(1e-323).view('V') == np.array(0.0).view('V'), 1e-323 == 0.0
(array(False), False)
>>> np.array(1e-324).view('V') == np.array(0.0).view('V'), 1e-324 == 0.0
(array(True), True)
>>> np.array(-1e-323).view('V') == np.array(-0.0).view('V'), -1e-323 == -0.0
(array(False), False)
>>> np.array(-1e-324).view('V') == np.array(-0.0).view('V'), -1e-324 == -0.0
(array(True), True)
>>> np.array(-0.0).view('V') == np.array(0.0).view('V'), 0.0 == -0.0
(array(False), True)
>>> np.inf * 0.0
nan
>>> np.array(np.nan).view('V') == np.array(np.nan).view('V'), np.nan == np.nan
(array(True), False)
>>> np.array(np.inf * 0.0).view('V') == np.array(np.nan).view('V'), np.inf * 0.0 == np.nan
(array(False), False)
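
The bit-level behaviour described above can also be reproduced without NumPy. The following is a minimal sketch using only the standard library; the helper name float64_bits is made up for illustration and is not part of zarr-python or NumPy.

```python
import math
import struct


def float64_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


pos_zero_bits = float64_bits(0.0)      # all 64 bits clear
neg_zero_bits = float64_bits(-0.0)     # only the sign bit set
subnormal_bits = float64_bits(5e-324)  # smallest positive subnormal

# Value equality cannot tell the two zeros apart; their bit patterns can.
zeros_equal_by_value = 0.0 == -0.0
zeros_equal_by_bits = pos_zero_bits == neg_zero_bits

# Literals too small to represent round to a (signed) zero, as in the session above.
underflow_bits = float64_bits(1e-324)
neg_underflow_bits = float64_bits(-1e-324)

# math.copysign is another way to read the sign of a zero.
neg_zero_sign = math.copysign(1.0, -0.0)
```

Here zeros_equal_by_value is True while zeros_equal_by_bits is False, which is the distinction the bitwise comparison relies on.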

@d-v-b
Contributor

d-v-b commented Aug 1, 2025

nan numbers can sometimes compare as equal when comparing their bits

This is actually potentially super useful, because the zarr v3 spec distinguishes between different types of nans, even though numpy does not. In order to ensure that arrays round-trip correctly through zarr python, we need to generate exactly the specific nan defined in the metadata. I did a quick check and numpy will preserve the underlying byte representation of different nans, so this should be possible.

>>> np.array([b'\x00\x00\x00\x00\x00\x00\xFF\xFF'], dtype='|V8').view('float').view('V')
array([b'\x00\x00\x00\x00\x00\x00\xFF\xFF'], dtype='|V8')
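
A small sketch of the round-trip check described above, assuming little-endian float64 data. It only reinterprets memory and performs no arithmetic on the NaN; the variable names are illustrative and nothing here is zarr-python API.

```python
import numpy as np

# A NaN with a non-default bit pattern, built directly from its bytes.
payload_bytes = b"\x00\x00\x00\x00\x00\x00\xff\xff"
payload_nan = np.frombuffer(payload_bytes, dtype="<f8")

# NumPy's default NaN, for comparison.
default_nan = np.array([np.nan], dtype="<f8")

# Both values are NaN as far as floating-point semantics are concerned.
both_are_nan = bool(np.isnan(payload_nan[0]) and np.isnan(default_nan[0]))

# Their underlying bytes differ, though.
bytes_differ = payload_nan.tobytes() != default_nan.tobytes()

# Reinterpreting float -> sized void -> float leaves the bytes untouched.
round_tripped = payload_nan.view("V8").view("<f8").tobytes()
```

After running this, round_tripped equals payload_bytes, so the specific NaN survives reinterpretation even though a value comparison cannot distinguish it from the default NaN.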

@bojidar-bg
Contributor Author

Oh, that's curious! Probably not something I can quite incorporate in the code here... unless we make all floating point arrays use bitwise comparison for empty chunks.. 🤔

@bojidar-bg
Contributor Author

bojidar-bg commented Aug 1, 2025

That .view("V") trick fails on GPU with a ZeroDivisionError, presumably in cupy:_core/core.pyx:81 when v_is == dtype.itemsize == 0, as is the case for the "V" dtype...
Using structured np.void dtypes to achieve the same idea doesn't work because np.void isn't hashable, but it works when using a sized "V" dtype, like "V16". Let's see if that's enough to get all tests green 😂 YES IT IS! 🎉 That GPU test was stubborn! 😂😂
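
A rough sketch of the sized-"V" idea, assuming NumPy on the CPU. The helper name all_equal_bitwise is hypothetical, and this is not the code merged in this PR; it only illustrates comparing bit patterns through a void dtype whose size matches the data's itemsize.

```python
import numpy as np


def all_equal_bitwise(data: np.ndarray, fill_value: float) -> bool:
    """Hypothetical helper: True if every element of `data` has exactly the
    same bit pattern as `fill_value` (so -0.0 and +0.0 count as different)."""
    data = np.asarray(data)
    # Cast the fill value to the data's dtype so both sides have the same width.
    fill = np.asarray(fill_value, dtype=data.dtype)
    # Broadcast the scalar fill value against the data explicitly.
    data_b, fill_b = np.broadcast_arrays(data, fill)
    # Sized void dtype, e.g. "V8" for float64, instead of the unsized "V".
    void_dtype = f"V{data.dtype.itemsize}"
    return bool(np.array_equal(data_b.view(void_dtype), fill_b.view(void_dtype)))
```

For example, all_equal_bitwise(np.zeros(4), -0.0) is False, whereas a value-based comparison would treat the same chunk as equal to the fill value.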

codecov bot commented Aug 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.54%. Comparing base (f087c56) to head (58f45b7).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3216   +/-   ##
=======================================
  Coverage   94.54%   94.54%           
=======================================
  Files          78       78           
  Lines        9419     9423    +4     
=======================================
+ Hits         8905     8909    +4     
  Misses        514      514           
Files with missing lines Coverage Δ
src/zarr/core/buffer/core.py 83.09% <100.00%> (+0.48%) ⬆️

# initialize the array with the negated fill value (-0.0 for +0.0, +0.0 for -0.0)
arr[:] = -fill_value
assert arr.nchunks_initialized == arr.nchunks
Contributor

this test is fine but ideally we would be testing the altered function explicitly, instead of indirectly via array creation + chunk writing. this is not a blocker for this PR, just something to sort out down the road

Contributor Author

That test is basically copied from test_write_empty_chunks_behavior right above it. But yeah, it might be worth having both a unit and an integration test in this case (:

Contributor

@d-v-b d-v-b left a comment

thanks for this fix @bojidar-bg!

@d-v-b d-v-b merged commit 1264a4d into zarr-developers:main Aug 6, 2025
31 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/zarr-python that referenced this pull request Aug 6, 2025
Development

Successfully merging this pull request may close these issues.

Negative zero not preserved
4 participants